DIFFERENTIAL ITEM PERFORMANCE FOR MEXICAN-AMERICAN
ESL STUDENTS AND WHITE NON-ESL STUDENTS ON MATHEMATICS 
AND ENGLISH ACHIEVEMENT TESTS 

Introduction

Test bias, whether associated with race, sex, or other population 
subgroups, is a serious and highly complex issue. Test and item bias are 
conceptualized as something that invalidates the meaning of test results for 
some subgroup of the population. One subgroup of particular interest is 
"English as a second language" (ESL) students. Performance on admissions and 
placement tests may be affected in nontrivial ways by the English language 
proficiency of ESL students. 

ESL students represent many different ethnic groups. The culture and 
language of the different ethnic groups vary widely; therefore, it is 
preferable to study ESL students separately for each ethnic group. The 
present study examines the performance of Mexican-American ESL students. 

Several researchers have looked at the validity of standardized tests for 
Mexican-American ESL students in the context of an external criterion, viz., 
college performance (Alderman, 1982; Breland and Duran, 1985; Mestre, 1981). 
The results of these studies vary. Some found evidence of differential 
validity while others did not. Differences in such things as predictor 
variables, criterion variables, and sample composition make comparisons among 
these studies difficult. 

It is possible to think of validity in the absence of an external 
criterion. Internal analysis focuses on the group performance within a 
measure. In the fall of 1986, a pilot study was conducted that investigated
the differential performance of Mexican-American ESL students and white
non-ESL students at the item level. The test materials used in the pilot study
were the English Usage and Mathematics Usage tests of the ACT Assessment
administered in October 1985. A procedure developed by Mantel and Haenszel
(1959) was used to examine differential item performance.

For the English Usage Test, 10 of the 75 items were identified as
performing significantly differently for ESL and non-ESL examinees. ESL
students were favored on five of the 10 items. When looking at the content
classification of each of these 10 items, there did not appear to be any
systematic differences in the classification of items that favored ESL and
non-ESL examinees. However, when examining all items in the test, both
significant and nonsignificant, the white non-ESL students tended to perform
relatively better on logic and organization items, while the ESL students
tended to perform relatively better on grammar items.

For the Mathematics Usage Test, none of the items showed a significant
difference between the Mexican-American ESL and white non-ESL examinees. In
looking at the direction of the Mantel-Haenszel statistic for the different
categories of math items, there again did not appear to be a systematic
difference between ESL examinees and white non-ESL examinees. However, there
seemed to be a slight tendency for the number of words in story problems to be
related to the degree to which the items favored the non-ESL examinees.

The present study was designed to replicate the pilot study and test the
hypotheses formed on the basis of that study. The hypotheses formed are 
listed below: 

1. Items that emphasize mechanics in the English Usage Test, such as 
grammar and punctuation, tend to favor ESL examinees. 

2. Items that focus upon style and structure in the English Usage Test 
tend to favor non-ESL students. 

3. Mathematical items with the greatest verbal load tend to favor non-ESL 
examinees. 

The primary objective of the present study was to investigate English
usage and mathematics items for differential item performance based on ESL and 
non-ESL examinees. A second objective of the study was to investigate 
specific hypotheses about the items with respect to differential item 
performance. 

Methodology 

Instrument and Subjects 

The test materials used in the present study were the English Usage and 
Mathematics Usage tests of the ACT Assessment, a college entrance exam. The 
English Usage Test is a 75-item, 40-minute test that measures understanding 
and use of basic elements of correct and effective writing: punctuation, 
grammar, sentence structure, diction and style, and logic and organization. 
The Mathematics Usage Test is a 40-item, 50-minute test that measures 
mathematical reasoning ability in six content areas. See Table 1 for a list 
and description of the item categories in each test. 

The samples of 471 Mexican-American, self-reported ESL students and 1000
white, self-reported non-ESL students were taken from the October 1986 ACT
Assessment administration. All Mexican-American ESL students who took the ACT
Assessment in October 1986 were included in the study. The 1000 white non-ESL 
students were randomly selected from the group of 160,220 white non-ESL 
examinees who took the ACT Assessment on that same date. 



Insert Table 1 about here 




Index of Differential Item Performance 

A contingency table procedure was used to measure differential item
performance (Mantel & Haenszel, 1959). The Mantel-Haenszel statistic
(MH-CHISQR; see Holland and Thayer, 1986) is based upon 2 x 2 contingency
tables for each total score category. The MH-CHISQR statistic is distributed
as a chi-square with one degree of freedom and is therefore a powerful
unbiased test (Cox, 1970). Two statistics related to the MH-CHISQR,
$\alpha_{MH}$ and $z_{MH}$, were also examined. The common odds ratio,
$\alpha_{MH}$, across the 2 x 2 tables, is given by

$$\alpha_{MH} = \frac{\sum_j A_j D_j / T_j}{\sum_j B_j C_j / T_j}$$

where $T_j$ is the total number of examinees in the jth matched set. $A_j$ and
$C_j$ represent the number of examinees in the reference and focal groups,
respectively, who answered an item correctly. $B_j$ and $D_j$ are the number
of examinees who responded incorrectly from the reference and focal groups,
respectively. The reference group establishes a standard against which the
performance of the focal group is compared. This ratio is on a scale of 0 to
infinity, with $\alpha_{MH} = 1$ representing a null value or no differential
item performance.
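
The computation described above can be sketched in a few lines of Python.
This is a minimal illustration, not the program used in the study. The
function names `mh_odds_ratio` and `mh_chi_square` are hypothetical, and each
matched set is assumed to be supplied as a tuple (A, B, C, D). The chi-square
shown is the usual continuity-corrected Mantel-Haenszel form, which the text
cites but does not spell out.

```python
from typing import Iterable, Tuple

# One 2x2 table per matched score level: (A, B, C, D), where
# A/B = reference group correct/incorrect, C/D = focal group correct/incorrect.
Table = Tuple[float, float, float, float]


def mh_odds_ratio(tables: Iterable[Table]) -> float:
    """Common odds ratio: sum(A*D/T) / sum(B*C/T) over the matched sets."""
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t == 0:
            continue  # an empty score level contributes nothing
        num += a * d / t
        den += b * c / t
    if den == 0:
        raise ValueError("odds ratio undefined: sum of B*C/T is zero")
    return num / den


def mh_chi_square(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel chi-square (1 df) with the 0.5 continuity correction."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:
            continue  # the variance term is undefined for T < 2
        n_ref, n_foc = a + b, c + d          # group sizes at this score level
        m_right, m_wrong = a + c, b + d      # correct / incorrect totals
        sum_a += a
        sum_e += n_ref * m_right / t         # expected value of A under the null
        sum_v += n_ref * n_foc * m_right * m_wrong / (t * t * (t - 1))
    if sum_v == 0:
        raise ValueError("chi-square undefined: zero variance")
    return (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
```

With a single matched set of (30, 10, 20, 20), for example, the sketch returns
an odds ratio of 3.0: the odds of a correct answer are three times as high in
the reference group as in the focal group.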

The value of $\alpha_{MH}$, for a studied item, is the "average factor by
which the odds that a member of the reference group is correct on the studied
item exceeds the corresponding odds for a comparable member of the focal
group" (Holland and Thayer, 1986). Holland and Thayer suggest taking the log
of $\alpha_{MH}$ to put it into a symmetric scale with zero as the null value.
We propose a slight modification of this procedure,

$$z_{MH} = -\frac{1}{1.7} \ln(\alpha_{MH}),$$

as a measure of the amount of differential item performance.

The value of $z_{MH}$ is a measure of the degree to which a white non-ESL
examinee found the studied item more difficult than did a comparably scoring
ESL examinee. Positive values imply that the ESL examinees found the item
relatively easier than the white non-ESL examinees; negative values indicate
that ESL examinees found the item relatively harder.
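
The sign convention just described can be illustrated with a short Python
sketch. The function name `z_mh` is hypothetical, and the default scaling
constant of 1/1.7 is an assumption about the index formula rather than a
confirmed value; it is exposed as a parameter so that a different constant
can be substituted.

```python
import math


def z_mh(alpha_mh: float, scale: float = 1.0 / 1.7) -> float:
    """Signed index -scale * ln(alpha_MH).

    `scale` defaults to 1/1.7, which is an assumed constant.
    alpha_MH > 1 (reference group favored) gives a negative index;
    alpha_MH < 1 (focal group favored) gives a positive index.
    """
    if alpha_mh <= 0:
        raise ValueError("alpha_MH must be positive")
    return -scale * math.log(alpha_mh)
```

Because the index is a multiple of the log odds ratio, odds ratios of 2 and
1/2 yield values of equal magnitude and opposite sign, and an odds ratio of 1
yields zero.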

Results 

Means, standard deviations, and sample size for each group are presented 
in Table 2. Mean raw scores and standard deviations are lower for Mexican- 
American ESL examinees than for white non-ESL examinees for both tests. 



Insert Table 2 about here 



Figures 1 and 2 present the cumulative frequency polygons for the 
Mexican-American ESL sample and the white non-ESL sample for the English Usage 
Test and the Mathematics Usage Test, respectively. 
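
The points of a cumulative frequency polygon can be derived from raw scores
as in the following sketch. The function name `cumulative_percent` is
hypothetical, and expressing the cumulative frequency as a percentage of the
sample (rather than as a count) is an assumption made so that groups of
different size can share one axis.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cumulative_percent(scores: Sequence[int], max_score: int) -> List[Tuple[int, float]]:
    """Return (score, percent of examinees at or below that score) pairs."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    n = len(scores)
    running = 0
    points = []
    for s in range(max_score + 1):
        running += counts.get(s, 0)  # examinees with raw score <= s
        points.append((s, 100.0 * running / n))
    return points
```

Plotting the resulting pairs for each group on the same axes gives one
polygon per group for a given test.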



Insert Figures 1 and 2 about here 

In Table 3, the z statistics from the M-H analysis, comparing
Mexican-American ESL examinees and white non-ESL examinees, are reported for
the mathematics and English items. A baseline for judging the magnitude of the
$z_{MH}$ statistic was obtained from an analysis which compared two randomly
equivalent groups of 1,000 white non-ESL examinees (see Shepard, 1984). Index
values that exceeded the largest value occurring in the white-white analysis
(.20 for English Usage and .23 for Mathematics) are starred in Table 3 as
performing differentially.
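
The flagging rule can be sketched as follows. The function name `flag_items`
is hypothetical, and two details are assumptions: that "largest value" refers
to the largest absolute index in the baseline comparison, and that an item is
flagged only when its absolute index is strictly greater than that cutoff.

```python
from typing import Dict, List


def flag_items(study_index: Dict[int, float], baseline_index: Dict[int, float]) -> List[int]:
    """Return item numbers whose |index| exceeds the largest baseline |index|."""
    if not baseline_index:
        raise ValueError("baseline comparison must contain at least one item")
    cutoff = max(abs(v) for v in baseline_index.values())
    return sorted(item for item, v in study_index.items() if abs(v) > cutoff)
```

With made-up baseline values whose largest magnitude is 0.20, made-up study
values of -0.33 and 0.31 would be flagged, while -0.02 and 0.20 would not.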



Insert Table 3 about here 



A substantial number of items at the end of the English test were flagged 
as easier for white non-ESL examinees as compared to Mexican-American ESL 
examinees. Figure 3 displays the magnitude and direction of the statistic 
pictorially with the 75 items grouped according to passage set. As can be 
seen, the last two passages contain more items which favor white non-ESL 
examinees. We speculated that this was due to a differential speededness 
effect between white non-ESL and Mexican-American ESL examinees. Since items 
omitted by examinees do not enter into our computation of the M-H statistics, 
we further speculated that the speededness effect was showing up because 
Mexican-American ESL examinees, in an effort to finish the test, may have 
randomly answered the last items. In an effort to reduce this possible 
differential speededness effect on the English Test, the M-H analysis was 
again done on the English Test, this time with the last two passage sets
omitted, thereby reducing the number of items from 75 to 58. Table 4
shows the items flagged in the 58-item English Test using the index value 
obtained in the white-white comparison as the criterion. (The criterion index
value was unchanged since there was no evidence of a speededness effect for
the white non-ESL examinees.) 
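
How the matched 2 x 2 tables might be assembled for a shortened test, with
omitted responses left out, is sketched below. The function name
`build_tables`, the response coding (1 = correct, 0 = incorrect, None =
omitted), and the zero-based item index are all assumptions for illustration.
The sketch also assumes that the matching score is the number correct on the
retained items, including the studied item.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Response = Optional[int]  # 1 = correct, 0 = incorrect, None = omitted


def build_tables(
    responses: Sequence[Sequence[Response]],
    is_focal: Sequence[bool],
    item: int,
    n_items: int,
) -> Dict[int, Tuple[int, ...]]:
    """Return {total score: (A, B, C, D)} for one studied item.

    Only the first `n_items` items count toward the matching score, and
    examinees who omitted the studied item are left out of its tables.
    """
    cells: Dict[int, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for row, focal in zip(responses, is_focal):
        answer = row[item]
        if answer is None:
            continue  # omitted responses do not enter the computation
        score = sum(1 for r in row[:n_items] if r == 1)
        # Cell order: A ref-correct, B ref-incorrect, C focal-correct, D focal-incorrect.
        index = (2 if focal else 0) + (0 if answer == 1 else 1)
        cells[score][index] += 1
    return {score: tuple(c) for score, c in sorted(cells.items())}
```

Calling the function with `n_items=58` on 75-item response records would
correspond to dropping the last two passage sets from the matching score.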



Insert Table 4 about here 



Results for the English Usage Test 

As can be seen in Table 4, seven items were flagged as performing 
differentially. Three of the items were found to be relatively easier for 
Mexican-American ESL examinees and four were found to be relatively easier for
the white non-ESL examinees. Each of the three items found to favor the
Mexican-American ESL examinees was classified as a diction and style item.
For the four items found to favor the white non-ESL examinees, one was 
classified as a punctuation item, two were sentence structure items, and one a 
diction and style item. Looking only at the seven items flagged as performing 
differentially, no conclusive evidence of any systematic differences in the 
classification of items that favored ESL and non-ESL examinees could be 
found. There was, at most, only a hint of a tendency for sentence structure 
items to favor non-ESL examinees and for diction and style items and grammar 
items to favor ESL examinees (see Figure 4). 



Insert Figure 4 about here 

Results for the Mathematics Usage Test

For the Mathematics Usage Test, only two items were flagged as performing 
differentially (see Table 3). Both of these items were arithmetic and 
algebraic reasoning items and both favored white non-ESL examinees. When all 
items in the test were examined, both significant and nonsignificant, there 
appeared to be no systematic differences in the content classification of
items favoring ESL examinees and white non-ESL examinees (see Figure 5). 

Items were also classified according to their verbal load. Figure 6
shows the magnitude and direction of the M-H z statistic for all items
categorized as (1) equations only (no words), (2) standard word count¹
less than [illegible], or (3) standard word count greater than or equal to
[illegible]. In general, the hypothesis that high word-count items favor
non-ESL students was not supported. However, the two items with the largest
index values were high word-count items that did favor the non-ESL students.
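
The verbal-load classification can be sketched as follows, taking the
standard word count to be the number of characters in the item stem divided
by 6. The function names are hypothetical, the cutoff is left as a required
parameter because its value is not assumed here, and counting every
character of the stem (spaces included) is an assumption.

```python
def standard_word_count(stem: str) -> float:
    """Number of characters in the item stem divided by 6."""
    return len(stem) / 6


def verbal_load_category(stem: str, has_words: bool, cutoff: float) -> str:
    """Place an item in one of the three verbal-load groups.

    `has_words` marks whether the stem contains any words at all;
    `cutoff` is the standard word count separating low from high load.
    """
    if not has_words:
        return "equations only"
    if standard_word_count(stem) < cutoff:
        return "low word count"
    return "high word count"
```

Under this definition a 24-character stem has a standard word count of 4, so
with an illustrative cutoff of 5 it would fall in the low word-count group.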



Insert Figures 5 and 6 about here 



¹ Standard word count here is defined as the number of characters in an item
stem divided by 6.


Discussion and Conclusions

The purpose of this research was to replicate the pilot study and test
the hypotheses formed on the basis of the pilot study. Hypothesis 1, which
stated that items emphasizing mechanics (such as grammar and punctuation) in
the English Usage Test tend to favor ESL examinees, was not supported.
Although more grammar items tended to favor ESL examinees than non-ESL
examinees, the magnitude of the $z_{MH}$ statistic was less than .15 for each
item. Also, items in the punctuation classification had a slight tendency to
favor non-ESL examinees, the opposite of what was hypothesized.

Hypothesis 2, which stated that items that focus upon style and structure 
in the English Usage Test tend to favor non-ESL students, was not supported by 
the present research. Although sentence structure items seemed to favor non- 
ESL examinees, items classified as diction and style seemed to favor ESL 
examinees as seen in Figure 4, and items classified as logic and organization
did not seem to favor either group.

Hypothesis 3, which stated that the verbal load of the math items is 
related to differential item performance for ESL and non-ESL examinees, was
also not strongly supported. However, the two items that were flagged as 
favoring white non-ESL examinees were items with high word counts. 

In summary, the results do not provide support for the specific 
hypotheses that were the focus of this study. Although the mean score for the 
Mexican-American ESL students was almost a full standard deviation below that 
of the non-ESL students for both tests, it appears that the group difference 
in performance was reflected throughout most of the items in the test. Both 
tests seemed to be functioning comparably for each of the investigated groups 
of examinees. We were unable to find specific categories of items that were 
disproportionately easy or hard for either of the groups. 
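
The size of the group difference can be checked against the means and
standard deviations reported in Table 2. The sketch below expresses each mean
difference in units of the white non-ESL standard deviation. That choice of
unit is an assumption, as is the hypothetical function name
`standardized_difference`; a pooled standard deviation would give somewhat
different values.

```python
def standardized_difference(mean_ref: float, mean_focal: float, sd_ref: float) -> float:
    """Mean difference expressed in reference-group standard deviation units."""
    if sd_ref <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_ref - mean_focal) / sd_ref


# Means and standard deviations as reported in Table 2.
english = standardized_difference(39.83, 31.50, 9.58)
mathematics = standardized_difference(20.79, 14.90, 8.64)
print(f"English: {english:.2f}  Mathematics: {mathematics:.2f}")
# prints "English: 0.87  Mathematics: 0.68"
```

On this scale the English difference is close to nine-tenths of a standard
deviation, and the mathematics difference is about two-thirds of one.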

Table 1

Content Categories of the ACT Assessment
English Usage and Mathematics Tests


English Usage

Punctuation. The items in this category test such conventions as the use and
placement of commas, colons, semicolons, dashes, parentheses, apostrophes, and
quotation, question, and exclamation marks.

Grammar. The items in this category test adjectives and adverbs,
conjunctions, and agreement between subject and verb and between pronouns and
their antecedents.

Sentence Structure. The items in this category test relationships
between/among clauses, placement of modifiers, parallelisms, and shifts in
construction.

Diction and Style. The items in this category test precision in word choice,
appropriateness in figurative language, and economy in writing.

Logic and Organization. The items in this category test the logical
organization of ideas: paragraphing, transitions, unity, and coherence.


Mathematics

Arithmetic and Algebraic Operations. The items in this category explicitly
describe operations to be performed by the student. The operations include
manipulating and simplifying expressions containing arithmetic or algebraic
fractions, performing basic operations in polynomials, solving linear
equations in one unknown, and performing operations on signed numbers.

Arithmetic and Algebraic Reasoning. These word problems present practical
situations in which algebraic and/or arithmetic reasoning is required. The
problems require the student to interpret the question and either to solve the
problem or to find an approach to its solution.

Geometry. The items in this category cover such topics as measurement of
lines and plane surfaces, properties of polygons, the Pythagorean theorem, and
relationships involving circles. Both formal and applied problems are
included.

Intermediate Algebra. The items in this category cover such topics as
dependence and variation of quantities related by specific formulas,
arithmetic and geometric series, simultaneous equations, inequalities,
exponents, radicals, graphs of equations, and quadratic equations.

Number and Numeration Concepts. The items in this category cover such topics
as rational and irrational numbers, set properties and operations, scientific
notation, prime and composite numbers, numeration systems with bases other
than 10, and absolute value.

Advanced Topics. The items in this category cover such topics as
trigonometric functions, permutations and combinations, probability,
statistics, and logic. Only simple applications of the skills implied by these
topics are tested.




Table 2

Means, standard deviations, and sample size
for ESL and non-ESL examinees
for Mathematics and English Usage Tests


                            Mean raw score       SD           N

English Usage (58 items)
   Mex.-Am. ESL                  31.50       [illegible]     471
   White non-ESL                 39.83          9.58        1000

Mathematics (40 items)
   Mex.-Am. ESL                  14.90          7.54         471
   White non-ESL                 20.79          8.64        1000

Table 3

Mantel-Haenszel Z Index for Mathematics
and English Usage Items


          English Usage          Mathematics Usage
          (75 items)             (40 items)
          Mex-Am ESL vs          Mex-Am ESL vs
Item      White Non-ESL          White Non-ESL

  1          -0.02                  -0.03
  2          -0.16                  -0.33*
  3          -0.17                  [illegible]
  4           0.31*                  0.17
  5           0.04                   0.05
  6          -0.07                   0.05
  7           0.16                   0.02
  8           0.06                  -0.11
  9          -0.11                   0.17
 10           0.37*                  0.14
 11           0.11                  -0.04
 12          -0.01                   0.11
 13           0.17                   0.03
 14           0.23*                 [illegible]
 15          -0.14                  [illegible]
 16          -0.01                  [illegible]
 17          -0.15                   0.05
 18           0.01                  -0.07
 19           0.17                  [illegible]
 20          -0.20*                 [illegible]
 21          -0.21*                  0.05
 22           0.09                   0.00
 23           0.09                   0.14
 24           0.13                  [illegible]
 25           0.22*                 -0.06
 26           0.15                   0.17
 27           0.09                   0.09
 28           0.09                  -0.03
 29          -0.14                  -0.10
 30           0.00                  -0.10
 31           0.16                  -0.01
 32           0.13                  [illegible]
 33           0.07                   0.06
 34           0.24*                  0.12
 35           0.24*                  0.16
 36          -0.01                  -0.02
 37          -0.15                  -0.38*
 38           0.04                  [illegible]
 39           0.19                   0.12
 40          -0.08                  -0.10
 41          -0.14
 42           0.00
 43           0.15
 44          -0.11
 45          -0.11
 46           0.08
 47           0.00
 48           0.2[illegible]*
 49           0.20*
 50           0.22*
 51           0.27*
 52          -0.02
 53          -0.04
 54           0.11
 55           0.14
 56           0.17
 57           0.17
 58          -0.35*
 59          -0.41*
 60           0.08
 61           0.17
 62          -0.35*
 63           0.19
 64          -0.19
 65          -0.25*
 66           0.11
 67          -0.23*
 68           0.12
 69          -0.21*
 70          -0.05
 71           0.11
 72          -0.30*
 73           0.01
 74          -0.12
 75          -0.28*

Negative values correspond to items that the Non-ESL group found
easier on the average than did comparable ESL group members.

Table 4

Mantel-Haenszel Z Index for the English Usage Test Items
deleting the last two passage sets


          Mex-Am ESL vs
Item      White Non-ESL

  1          -0.05
  2          -0.18
  3          -0.18
  4           0.31*
  5           0.00
  6          -0.13
  7           0.17
  8           0.01
  9          -0.12
 10           0.33*
 11           0.09
 12          -0.11
 13           0.14
 14           0.17
 15          -0.13
 16          -0.07
 17          -0.19
 18           0.00
 19           0.11
 20          -0.25*
 21          -0.25*
 22           0.04
 23           0.07
 24           0.10
 25           0.15
 26          [illegible]
 27           0.05
 28           0.05
 29          -0.15
 30          -0.02
 31           0.12
 32           0.10
 33           0.04
 34           0.18
 35           0.21*
 36          -0.05
 37          -0.19
 38           0.04
 39           0.19
 40          -0.14
 41          -0.25*
 42          -0.03
 43           0.07
 44           0.06
 45          -0.17
 46           0.08
 47          -0.10
 48           0.18
 49           0.13
 50          -0.02
 51           0.15
 52           0.19
 53          -0.08
 54          -0.05
 55           0.09
 56           0.08
 57          [illegible]
 58          -0.37*

Negative values correspond to items that the Non-ESL group found easier
on the average than did comparable ESL group members.

FIGURE 2